Disambiguating Entities Referred by Web Endpoints using Tree Ensembles
Authors
Abstract
This paper describes the system and results of team “EOF” from the University of Melbourne in the ALTA 2016 shared task, which uses cross-document coreference resolution to determine whether two URLs refer to the same underlying entity. Our submission is a two-stage system. The first stage identifies the underlying entity for a given URL: it ranks the entity mentions found in the crawled text using logistic regression over entity-level features. The second stage disambiguates the entities identified for a pair of URLs, using a tree ensemble model to classify whether both URLs refer to the same underlying entity. Our system achieved a final F1-score of 86.02% on the private leaderboard, the best score among all participating systems.
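The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`in_title`, `in_url`, `freq`, `jaccard`, `exact`, `len_diff`), the hand-set weights, and the three decision stumps are all hypothetical stand-ins, chosen only to show how a logistic scorer that ranks mentions can feed a voting ensemble of trees that classifies a URL pair. A real system would learn both stages from labelled data.

```python
import math

# Stage 1: rank candidate entity mentions with a logistic-regression-style scorer.
# Feature names and weights are hypothetical and hand-set for illustration only.
WEIGHTS = {"in_title": 2.0, "in_url": 1.5, "freq": 0.8}
BIAS = -1.0

def mention_score(features):
    """Return the logistic score (between 0 and 1) for one mention's features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def top_entity(mentions):
    """Pick the highest-scoring mention; `mentions` maps text -> feature dict."""
    return max(mentions, key=lambda m: mention_score(mentions[m]))

# Stage 2: a toy tree ensemble, here a majority vote over depth-1 trees (stumps).
def token_jaccard(a, b):
    """Jaccard overlap between the lowercase token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def pair_features(entity_a, entity_b):
    """Hypothetical pairwise features for the two entities chosen in stage 1."""
    return {
        "jaccard": token_jaccard(entity_a, entity_b),
        "exact": 1.0 if entity_a.lower() == entity_b.lower() else 0.0,
        "len_diff": abs(len(entity_a) - len(entity_b)),
    }

# Each stump is (feature, threshold, prediction when feature >= threshold).
# Below the threshold the stump predicts the opposite label.
STUMPS = [
    ("jaccard", 0.5, True),
    ("exact", 1.0, True),
    ("len_diff", 10, False),
]

def same_entity(entity_a, entity_b):
    """Return True when a majority of stumps vote that the entities match."""
    feats = pair_features(entity_a, entity_b)
    positive_votes = 0
    for name, threshold, predict_if_ge in STUMPS:
        at_or_above = feats[name] >= threshold
        prediction = predict_if_ge if at_or_above else not predict_if_ge
        positive_votes += 1 if prediction else 0
    return positive_votes > len(STUMPS) / 2

# Usage: two crawled pages, each with candidate mentions and made-up features.
page_a = {
    "Barack Obama": {"in_title": 1, "in_url": 1, "freq": 0.9},
    "White House": {"in_title": 0, "in_url": 0, "freq": 0.4},
}
page_b = {
    "Barack Obama": {"in_title": 1, "in_url": 0, "freq": 0.7},
    "Chicago": {"in_title": 0, "in_url": 0, "freq": 0.5},
}
entity_a = top_entity(page_a)
entity_b = top_entity(page_b)
print(entity_a, "|", entity_b, "| same:", same_entity(entity_a, entity_b))
```

Keeping the stages separate means the pair classifier only ever compares one chosen entity per URL, so errors in the ranking stage propagate directly to the final decision.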
Similar resources
Evaluating the Effectiveness of Ensembles of Decision Trees in Disambiguating Senseval Lexical Samples
This paper presents an evaluation of an ensemble-based system that participated in the English and Spanish lexical sample tasks of SENSEVAL-2. The system combines decision trees of unigrams, bigrams, and co-occurrences into a single classifier. The analysis is extended to include the SENSEVAL-1 data.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text
Precisely identifying entities in web documents is essential for document indexing, web search and data integration. Entity disambiguation is the challenge of determining the correct entity out of various candidate entities. Our novel method utilizes background knowledge in the form of a populated ontology. Additionally, it does not rely on the existence of any structure in a document or the ap...
Disambiguating Descriptions: Mapping Digital Special Collections Metadata into Linked Open Data Formats
In this poster we describe the Linked Open Data (LOD) for Digital Special Collections project at the University of Illinois at Urbana-Champaign and describe some of the particular challenges that legacy metadata poses for representation in LOD formats. LOD formats are primarily based on the World Wide Web Consortium’s Resource Description Framework standard which demands both that entities be n...
Automatic QoS-aware Web Services Composition based on Set-Cover Problem
By definition, web-services composition seeks an optimal coordination among a number of available web services to provide a new composed web service that satisfies users' requirements for which a single web service is not (good) enough. In this article, the formulation of the automatic web-services composition is proposed as several set-cover problems and an approxima...
A Maximum Entropy Approach To Disambiguating VerbNet Classes
This paper focuses on verb sense disambiguation cast as inferring the VerbNet class to which a verb belongs. To train three different supervised learning models (Maximum Entropy, or MaxEnt; Naive Bayes; and Decision Tree), we used lexical, co-occurrence and typed-dependency features. For each model, we built three classifiers: one single classifier for all verbs, one single classifier for polysemou...